[DRAFT] Add cache manager and package cache routing. by Xaenalt · Pull Request #1216 · python-wheel-build/fromager

Xaenalt · 2026-06-24T13:53:16Z

Don't bother reviewing till I address at least the AI comments
(Will squash once we're ready, I wanted to split the original 2 in case we want to bisect)

Pull Request Description

What

This adds two features:

One is a cache manager, that allows you to specify multiple remote caches.
Routes built and cache hit packages to different directories based on if they come from a top level dep or not. This lets us split the accelerated packages into their specific variant collection, and all transitive deps into the shared collection. This does require the user to specify which is which, but wiring it up based on shared libraries felt harder.

Why

This solves two longstanding problems:

Separation of caches for shared deps and top level accelerated packages
Pulling from multiple remote caches
PR follows CONTRIBUTING.md guidelines

AI summaries below:

Cache Manager:

    feat(cache): add unified CacheManager subsystem with remote PEP 503 support
    
    Introduces a new centralized cache management layer that replaces the
    ad-hoc per-directory lookups with a structured, hierarchical system.
    
    Key components:
    - `CacheManager` with collection-based lookup across local and remote backends
    - `RemotePEP503Backend` with lazy per-package fetching and session-scoped index
    - `StoreRouter` for directing artifacts to the correct collection
    - Short-circuit optimization in `_phase_prepare_source` that skips source
      download, build env, and build dep resolution on cache hits
    - `update_wheel_mirror` called after remote cache downloads so wheels are
      indexed for subsequent build dependency resolution via the internal server
    - CLI commands (`fromager cache list/stats/verify/invalidate/gc`)
    - `--use-cache-manager` and `--cache-wheel-server-url` bootstrap options

Cache Routing:

    feat(cache): route built wheels to variant or shared collection
    
    When the CacheManager is active and a non-cpu variant has top-level
    requirements, newly built wheels are routed to the appropriate
    collection directory:
    
    - Packages listed in the variant's requirements file → wheels-repo/variants/<variant>/
    - Unlisted transitive dependencies → wheels-repo/downloads/ (shared default)
    
    This ensures the variant collection contains exactly the packages
    explicitly requested in the requirements file, while common deps
    discovered during dependency resolution stay in the shared collection
    for reuse across variants.
    
    Changes:
    - `_build_cache_manager` accepts `toplevel_reqs` and uses them as the
      variant package set for store routing
    - `StoreRouter` routes based on variant_packages (from requirements
      file), not just pre-built packages
    - `_phase_build` calls `store_wheel` after building (not for cached hits)
    - `LocalDirectoryBackend.store()` copies (not moves) to preserve the
      original in downloads/ for the internal wheel server

coderabbitai · 2026-06-24T13:53:33Z

📝 Walkthrough

Walkthrough

This PR adds cache keys, backends, routing, stats, and manager orchestration. WorkContext gains an optional cache manager, SourceType gains CACHED, and bootstrap can build and use the manager for cached-wheel lookup and stored wheels. A new cache CLI group adds list, stats, verify, invalidate, and gc subcommands. Tests cover cache primitives, backends, routing, manager behavior, CLI flows, and bootstrap integration.

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

🚥 Pre-merge checks | ✅ 4

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title matches the main change: introducing cache manager support and cache routing.
Description check	✅ Passed	The description directly describes the new cache manager and routing behavior in the changeset.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands.}

coderabbitai

Actionable comments posted: 6

🧹 Nitpick comments (1)

tests/test_cache.py (1)
1045-1103: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Assert same-run promotion to the local backend here.

This test proves the file was downloaded, but not that the CacheManager actually promoted it into local_backend for subsequent lookups. Adding a second lookup (or a direct local_backend.lookup(...) assertion) would catch regressions where remote hits never become local cache hits.
 Based on learnings and path instructions: "tests/**: Verify test actually tests the intended behavior. Check for missing edge cases."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_cache.py` around lines 1045 - 1103, The remote-hit test currently
only verifies the download result and does not confirm same-run promotion into
the local cache. Update test_remote_hit_downloads_to_local_store in CacheManager
flow to perform a second lookup or directly query local_backend after the first
remote hit, and assert that the item is now served from LocalDirectoryBackend as
a local hit. Use the existing symbols CacheManager.lookup_wheel, local_backend,
and RemotePEP503Backend to keep the test focused on promotion behavior.
Source: Path instructions

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/fromager/bootstrapper.py`:
- Around line 1535-1541: The wheel mirror update is happening before the routed
cache entry is stored, which can leave the internal index out of sync for the
newly copied path. In `bootstrapper.py`, reorder the flow in the cache-hit
handling around `server.update_wheel_mirror`, `self.ctx.package_build_info`, and
`self.ctx.cache.store_wheel` so the wheel is stored in the routed collection
first and the mirror is updated afterward. Keep the existing routing logic
intact, just move the mirror refresh to run after `store_wheel()` completes.

In `@src/fromager/cache.py`:
- Around line 664-667: After backend.fetch in the cache hit path, register the
downloaded artifact with collection.store_backend so its index is updated as a
local artifact instead of remaining remote-only. Use the existing cache-hit flow
around backend.fetch and the collection.store_backend object to add the
downloaded local_path back into the store manager in the same way promoted
artifacts are tracked, so live cache/admin commands can see it.
- Around line 454-463: Verify and enforce the advertised sha256 when downloading
a remote wheel in fetch(). After the session.get/iter_content write completes,
compute the file’s SHA256 for the downloaded target and compare it against
info.sha256 from ArtifactInfo; if it does not match, reject the wheel by
removing the cached file and raising an error. Keep the existing download flow
in fetch() but add the hash validation before returning target so only verified
wheels are accepted.
- Around line 504-535: The _parse_project_page() helper is accepting untrusted
anchor text as ArtifactInfo.filename, which later gets joined by fetch() when
writing to dest, so reject any remote wheel name containing path components or
traversal segments. Update _parse_project_page() to validate the parsed filename
before appending ArtifactInfo, and only allow a plain basename for remote
artifacts while keeping the existing URL and sha256 parsing logic intact.

In `@src/fromager/commands/cache_cmd.py`:
- Around line 92-99: The variant cache setup in cache_cmd.py is reusing the same
prebuilt_backend instance in both the variant and default collections, which
breaks collection scoping and causes double counting/mutating shared files.
Update the CacheCollection construction logic around wkctx.variant so the
variant collection only owns variant-specific backends, while prebuilt_backend
remains attached to the default collection and is still reachable via
search_order. Keep the unique symbols CacheCollection, variant_backends,
collections[wkctx.variant], and search_order aligned so cache list/stats and
invalidate --collection <variant> operate on separate backend instances.

In `@tests/test_bootstrapper_iterative.py`:
- Around line 1319-1368: The cache hit stats test is bypassing the real lookup
path by mocking _find_cached_wheel_via_manager, so CacheManager.lookup_wheel()
never executes and the hit event cannot be verified. Update
test_cache_hit_records_stats to exercise the actual manager lookup path by only
stubbing the unpack step (for example, _unpack_metadata_from_wheel) and then
assert the relevant stats counter/event produced by CacheManager/Bootstrapper.
If this test is only duplicating phase progression, rename or remove it so it
validates the intended stats behavior.

---

Nitpick comments:
In `@tests/test_cache.py`:
- Around line 1045-1103: The remote-hit test currently only verifies the
download result and does not confirm same-run promotion into the local cache.
Update test_remote_hit_downloads_to_local_store in CacheManager flow to perform
a second lookup or directly query local_backend after the first remote hit, and
assert that the item is now served from LocalDirectoryBackend as a local hit.
Use the existing symbols CacheManager.lookup_wheel, local_backend, and
RemotePEP503Backend to keep the test focused on promotion behavior.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: a423bd0f-a4ca-4a6b-892c-f98580939f94

📥 Commits

Reviewing files that changed from the base of the PR and between 694f04f and 28b92ad.

📒 Files selected for processing (9)

pyproject.toml
src/fromager/bootstrapper.py
src/fromager/cache.py
src/fromager/commands/bootstrap.py
src/fromager/commands/cache_cmd.py
src/fromager/context.py
src/fromager/requirements_file.py
tests/test_bootstrapper_iterative.py
tests/test_cache.py

coderabbitai

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/fromager/cache.py`:
- Line 599: The assignment in the cache setup is treating an explicit empty
variant_packages set as missing, so callers can unintentionally fall back to
accelerated_packages. Update the logic in the relevant cache initialization path
around self._variant_packages to distinguish None from an empty set, preserving
an explicitly provided empty variant_packages while still using
accelerated_packages only when variant_packages is not supplied.
- Around line 686-692: The cache lookup path in lookup_wheel/cache fetch
handling treats a fetch failure as fatal instead of falling through to the next
cache or build fallback. Update the logic around backend.fetch and the local
index store in cache.py so exceptions from unusable hits (HTTP errors, hash
mismatch, I/O failures) are caught and treated like a miss, allowing the loop
over backends to continue. Keep the successful-path registration with
collection.store_backend.store intact, but only after a confirmed fetch.
- Around line 366-369: The cache write path in store() currently skips copying
when a same-named artifact already exists, which can leave stale wheels on disk
while still treating the new build as stored. Update the logic around
Cache.store() and the dest check so existing entries are either verified before
reuse or overwritten with the newly built artifact before indexing, ensuring the
cache index always matches the actual file on disk.
- Around line 461-474: The download path in the cache handling logic should not
trust an existing wheel at target without validating it first, and it should not
leave partial files behind after a failed or interrupted write. Update the
cache/download flow around the existing target check and the write-and-hash
block in the cache module so that an existing file is SHA256-verified before
returning it, and any checksum mismatch or write failure removes the target
before raising. Use the existing download helper and the hasher/target logic to
keep validation and cleanup consistent.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 9a7ae0f7-3f32-4f7e-b0a2-36b4e2177ca6

📥 Commits

Reviewing files that changed from the base of the PR and between 28b92ad and c2e2a8e.

📒 Files selected for processing (6)

src/fromager/bootstrapper.py
src/fromager/cache.py
src/fromager/commands/bootstrap.py
src/fromager/commands/cache_cmd.py
tests/test_bootstrapper_iterative.py
tests/test_cache.py

🚧 Files skipped from review as they are similar to previous changes (5)

src/fromager/commands/bootstrap.py
src/fromager/commands/cache_cmd.py
tests/test_cache.py
tests/test_bootstrapper_iterative.py
src/fromager/bootstrapper.py

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (1)

src/fromager/cache.py (1)
718-725: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Add the requirement logging context to this warning.

This warning is per-requirement but does not use req_ctxvar_context(), so it can lose the structured requirement context used elsewhere.

As per coding guidelines, “Use req_ctxvar_context() for per-requirement logging.”
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/fromager/cache.py` around lines 718 - 725, The cache hit warning in the
requirement fetch path is missing the structured per-requirement logging
context. Update the logging in the code path around the cache fetch warning to
run inside req_ctxvar_context() so the warning emitted from the
requirement-specific logic carries the same requirement context used elsewhere.
Keep the existing logger.warning message and symbols like req, version,
collection_name, backend.name, and err, but ensure the warning is emitted within
the req_ctxvar_context() wrapper.
Source: Coding guidelines

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/fromager/cache.py`:
- Around line 475-489: The temporary download path in the cache download flow is
deterministic, so parallel fetches can overwrite each other’s in-progress file.
Update the download logic in the cache helper that creates tmp_target and later
calls tmp_target.replace(target) to use a unique temp file under dest for each
download, then keep the atomic replace into target once the content and sha256
check succeed. Ensure the change stays localized to the existing
download-and-verify path so concurrent builders do not share the same temporary
filename.
- Around line 367-369: The cache write in the artifact copy path is not atomic,
so concurrent readers can see a partially written wheel. Update the logic in the
copy block that uses `self._directory`, `dest`, and `shutil.copy2` to write to a
unique temporary file in the same directory first, then atomically swap it into
place with `replace()`. Keep the existing freshness check around
`artifact.samefile(dest)`/`dest.exists()` and apply the atomic replace only when
the cached file needs updating.

---

Nitpick comments:
In `@src/fromager/cache.py`:
- Around line 718-725: The cache hit warning in the requirement fetch path is
missing the structured per-requirement logging context. Update the logging in
the code path around the cache fetch warning to run inside req_ctxvar_context()
so the warning emitted from the requirement-specific logic carries the same
requirement context used elsewhere. Keep the existing logger.warning message and
symbols like req, version, collection_name, backend.name, and err, but ensure
the warning is emitted within the req_ctxvar_context() wrapper.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 79f7fc32-8f30-49aa-bc6c-755c6c82cc77

📥 Commits

Reviewing files that changed from the base of the PR and between c2e2a8e and be8d8ae.

📒 Files selected for processing (1)

src/fromager/cache.py

mergify · 2026-06-26T21:14:49Z

This pull request has merge conflicts that must be resolved before it can be merged.
@Xaenalt please rebase your branch.

…upport Introduces a new centralized cache management layer that replaces the ad-hoc per-directory lookups with a structured, hierarchical system. Key components: - `CacheManager` with collection-based lookup across local and remote backends - `RemotePEP503Backend` with lazy per-package fetching and session-scoped index - `StoreRouter` for directing artifacts to the correct collection - Short-circuit optimization in `_phase_prepare_source` that skips source download, build env, and build dep resolution on cache hits - `update_wheel_mirror` called after remote cache downloads so wheels are indexed for subsequent build dependency resolution via the internal server - CLI commands (`fromager cache list/stats/verify/invalidate/gc`) - `--use-cache-manager` and `--cache-wheel-server-url` bootstrap options Co-Authored-By: Claude <claude@anthropic.com> Signed-off-by: Sean Pryor <spryor@redhat.com> Co-authored-by: Cursor <cursoragent@cursor.com>

When the CacheManager is active and a non-cpu variant has top-level requirements, newly built wheels are routed to the appropriate collection directory: - Packages listed in the variant's requirements file → wheels-repo/variants/<variant>/ - Unlisted transitive dependencies → wheels-repo/downloads/ (shared default) This ensures the variant collection contains exactly the packages explicitly requested in the requirements file, while common deps discovered during dependency resolution stay in the shared collection for reuse across variants. Changes: - `_build_cache_manager` accepts `toplevel_reqs` and uses them as the variant package set for store routing - `StoreRouter` routes based on variant_packages (from requirements file), not just pre-built packages - `_phase_build` calls `store_wheel` after building (not for cached hits) - `LocalDirectoryBackend.store()` copies (not moves) to preserve the original in downloads/ for the internal wheel server Co-Authored-By: Claude <claude@anthropic.com> Signed-off-by: Sean Pryor <spryor@redhat.com> Co-authored-by: Cursor <cursoragent@cursor.com>

- Verify SHA256 hash of remote wheel downloads, rejecting mismatches - Reject filenames with path traversal components from remote indices - Register remote cache hits in local store backend index after download - Reorder mirror update to run after store_wheel in cache hit path - Remove shared prebuilt_backend from variant collection (avoids double-counting) - Fix stats test to exercise real CacheManager.lookup_wheel path Co-Authored-By: Claude <claude@anthropic.com> Signed-off-by: Sean Pryor <spryor@redhat.com> Co-authored-by: Cursor <cursoragent@cursor.com>

- Use atomic tmp+replace for remote downloads to prevent partial files - Verify SHA256 of existing cached files before reusing them - Wrap fetch() in try/except so failures fall through to next backend - Fix store() to overwrite stale same-named files using samefile check - Fix StoreRouter treating empty variant_packages set as falsy Co-Authored-By: Claude <claude@anthropic.com> Signed-off-by: Sean Pryor <spryor@redhat.com> Co-authored-by: Cursor <cursoragent@cursor.com>

Use NamedTemporaryFile + replace() for both local store copies and remote downloads so parallel builders cannot observe partial files or collide on a shared deterministic temp path. Co-Authored-By: Claude <claude@anthropic.com> Signed-off-by: Sean Pryor <spryor@redhat.com> Co-authored-by: Cursor <cursoragent@cursor.com>

coderabbitai

Actionable comments posted: 4

🧹 Nitpick comments (2)

tests/test_cache.py (2)
1143-1155: 🗄️ Data Integrity & Integration | 🔵 Trivial | ⚡ Quick win

Assert remote hits are registered in the local backend index.

The integration test verifies the file is downloaded, but not that collection.store_backend.store() updated the local index for later lookups/admin commands. Add an assertion against local_backend.lookup(key). As per path instructions, “Verify test actually tests the intended behavior. Check for missing edge cases.”
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_cache.py` around lines 1143 - 1155, The integration test for
manager.lookup_wheel only checks the downloaded wheel result and misses
verifying that a remote hit is recorded in the local backend index. After the
existing assertions, add an assertion using local_backend.lookup(key) to confirm
collection.store_backend.store() updated the local index for later lookups/admin
commands. Use the existing manager.lookup_wheel flow and the local_backend/key
symbols from this test to locate the right spot.
Source: Path instructions

566-592: 🔒 Security & Privacy | 🔵 Trivial | ⚡ Quick win

Cover the existing-file SHA mismatch path.

This test only proves that an existing file is reused when no hash is provided. Add the edge case where info.sha256 is present and the existing file is corrupt, so the re-download behavior cannot regress. As per path instructions, “Verify test actually tests the intended behavior. Check for missing edge cases.”
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_cache.py` around lines 566 - 592, The existing-file reuse test in
test_fetch_skips_existing_file only covers the no-hash case, so extend it to
include an ArtifactInfo with sha256 set and a preexisting corrupt file in the
destination. Update the assertions around RemotePEP503Backend.fetch so the
existing-file shortcut is only accepted when the file matches the expected
SHA256, and verify that a hash mismatch triggers re-download instead of
returning the stale path. Use the existing test_fetch_skips_existing_file,
RemotePEP503Backend.fetch, and ArtifactInfo symbols to keep the new case aligned
with the current behavior.
Source: Path instructions

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/fromager/cache.py`:
- Around line 324-335: Skip symlinked wheels during cache indexing in the logic
that scans self._directory in cache.py before building WheelCacheKey and
ArtifactInfo. Check each wheel_file from the glob and continue past any symlink
so url_or_path never points to a resolved target outside the cache root. Keep
the change localized to the wheel discovery loop used by invalidate/gc to
prevent later unlink operations from following symlinks.
- Around line 573-588: The URL handling in the artifact resolution logic still
allows plaintext http:// wheel links and accepts downloads without integrity
metadata. Update the URL resolution path in cache.py so the link parsing code
that builds url from href and strips sha256 fragments rejects any non-HTTPS
artifact URL or otherwise enforces that a valid sha256 is present before the
artifact is accepted. Make the check in the same flow that handles href
resolution and fragment parsing so absolute URLs cannot bypass the integrity
requirement.

In `@src/fromager/commands/cache_cmd.py`:
- Around line 357-398: Reject negative values for the gc command’s --keep-latest
option by constraining the Click parameter at the boundary in gc and/or the
`@click.option` definition. Update the keep_latest handling so only zero or
positive integers are accepted before the entries[keep_latest:] slicing logic
runs, preventing negative indexes from causing unintended cache deletions.

In `@tests/test_cache.py`:
- Around line 329-346: The test URLs in RemotePEP503Backend setup use
cache.example.com, but repository conventions require .test domains in tests.
Update the affected assertions and fixture-like instantiations in test_cache.py
to use cache.test instead, keeping the same RemotePEP503Backend cases such as
test_name_default, test_not_writable, and test_store_raises.

---

Nitpick comments:
In `@tests/test_cache.py`:
- Around line 1143-1155: The integration test for manager.lookup_wheel only
checks the downloaded wheel result and misses verifying that a remote hit is
recorded in the local backend index. After the existing assertions, add an
assertion using local_backend.lookup(key) to confirm
collection.store_backend.store() updated the local index for later lookups/admin
commands. Use the existing manager.lookup_wheel flow and the local_backend/key
symbols from this test to locate the right spot.
- Around line 566-592: The existing-file reuse test in
test_fetch_skips_existing_file only covers the no-hash case, so extend it to
include an ArtifactInfo with sha256 set and a preexisting corrupt file in the
destination. Update the assertions around RemotePEP503Backend.fetch so the
existing-file shortcut is only accepted when the file matches the expected
SHA256, and verify that a hash mismatch triggers re-download instead of
returning the stale path. Use the existing test_fetch_skips_existing_file,
RemotePEP503Backend.fetch, and ArtifactInfo symbols to keep the new case aligned
with the current behavior.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: f189df6f-a24c-47ee-9690-90fe700d225e

📥 Commits

Reviewing files that changed from the base of the PR and between be8d8ae and 4449d85.

📒 Files selected for processing (9)

pyproject.toml
src/fromager/bootstrapper.py
src/fromager/cache.py
src/fromager/commands/bootstrap.py
src/fromager/commands/cache_cmd.py
src/fromager/context.py
src/fromager/requirements_file.py
tests/test_bootstrapper_iterative.py
tests/test_cache.py

🚧 Files skipped from review as they are similar to previous changes (6)

src/fromager/requirements_file.py
pyproject.toml
src/fromager/context.py
src/fromager/commands/bootstrap.py
src/fromager/bootstrapper.py
tests/test_bootstrapper_iterative.py

coderabbitai · 2026-06-29T14:50:10Z

+        for wheel_file in self._directory.glob("*.whl"):
+            try:
+                name, version, build_tag, _ = parse_wheel_filename(wheel_file.name)
+                key = WheelCacheKey(
+                    package=name,
+                    version=version,
+                    build_tag=build_tag,
+                )
+                info = ArtifactInfo(
+                    filename=wheel_file.name,
+                    url_or_path=str(wheel_file.resolve()),
+                    size_bytes=wheel_file.stat().st_size,


🔒 Security & Privacy | 🟠 Major | ⚡ Quick win

Skip symlinked wheels before indexing local cache paths.

wheel_file.resolve() follows symlinks, and cache maintenance later unlinks ArtifactInfo.url_or_path. A symlinked *.whl inside the cache can make invalidate/gc delete a file outside the cache root.

Proposed fix

for wheel_file in self._directory.glob("*.whl"): try: + if wheel_file.is_symlink(): + logger.warning("skipping symlinked wheel file: %s", wheel_file.name) + continue name, version, build_tag, _ = parse_wheel_filename(wheel_file.name) key = WheelCacheKey( package=name, version=version, build_tag=build_tag, @@ info = ArtifactInfo( filename=wheel_file.name, - url_or_path=str(wheel_file.resolve()), + url_or_path=str(wheel_file.absolute()), size_bytes=wheel_file.stat().st_size, )

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

for wheel_file in self._directory.glob("*.whl"):

try:

name, version, build_tag, _ = parse_wheel_filename(wheel_file.name)

key = WheelCacheKey(

package=name,

version=version,

build_tag=build_tag,

)

info = ArtifactInfo(

filename=wheel_file.name,

url_or_path=str(wheel_file.resolve()),

size_bytes=wheel_file.stat().st_size,

for wheel_file in self._directory.glob("*.whl"):

try:

if wheel_file.is_symlink():

logger.warning("skipping symlinked wheel file: %s", wheel_file.name)

continue

name, version, build_tag, _ = parse_wheel_filename(wheel_file.name)

key = WheelCacheKey(

package=name,

version=version,

build_tag=build_tag,

)

info = ArtifactInfo(

filename=wheel_file.name,

url_or_path=str(wheel_file.absolute()),

size_bytes=wheel_file.stat().st_size,

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/fromager/cache.py` around lines 324 - 335, Skip symlinked wheels during cache indexing in the logic that scans self._directory in cache.py before building WheelCacheKey and ArtifactInfo. Check each wheel_file from the glob and continue past any symlink so url_or_path never points to a resolved target outside the cache root. Keep the change localized to the wheel discovery loop used by invalidate/gc to prevent later unlink operations from following symlinks.

coderabbitai · 2026-06-29T14:50:11Z

+            # Resolve relative URLs
+            if href.startswith("http://") or href.startswith("https://"):
+                url = href
+            elif href.startswith("/"):
+                parsed = urlparse(base_url)
+                url = f"{parsed.scheme}://{parsed.netloc}{href}"
+            else:
+                url = base_url.rstrip("/") + "/" + href
+
+            # Strip hash fragment for the URL but extract sha256 if present
+            sha256 = None
+            if "#" in url:
+                url_part, fragment = url.rsplit("#", 1)
+                if fragment.startswith("sha256="):
+                    sha256 = fragment[7:]
+                url = url_part


🔒 Security & Privacy | 🟠 Major | ⚡ Quick win

Reject plaintext artifact downloads without integrity metadata.

A HTTPS index can still advertise an absolute http:// wheel URL, and wheels without sha256 are accepted after download. Either reject plaintext artifact URLs entirely or require a valid hash before accepting them.

Proposed fix

if "#" in url: url_part, fragment = url.rsplit("#", 1) if fragment.startswith("sha256="): sha256 = fragment[7:] url = url_part + + if urlparse(url).scheme == "http" and not sha256: + logger.warning( + "skipping insecure remote artifact without sha256: %s", url + ) + continue

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

# Resolve relative URLs

if href.startswith("http://") or href.startswith("https://"):

url = href

elif href.startswith("/"):

parsed = urlparse(base_url)

url = f"{parsed.scheme}://{parsed.netloc}{href}"

else:

url = base_url.rstrip("/") + "/" + href

# Strip hash fragment for the URL but extract sha256 if present

sha256 = None

if "#" in url:

url_part, fragment = url.rsplit("#", 1)

if fragment.startswith("sha256="):

sha256 = fragment[7:]

url = url_part

# Resolve relative URLs

if href.startswith("http://") or href.startswith("https://"):

url = href

elif href.startswith("/"):

parsed = urlparse(base_url)

url = f"{parsed.scheme}://{parsed.netloc}{href}"

else:

url = base_url.rstrip("/") + "/" + href

# Strip hash fragment for the URL but extract sha256 if present

sha256 = None

if "#" in url:

url_part, fragment = url.rsplit("#", 1)

if fragment.startswith("sha256="):

sha256 = fragment[7:]

url = url_part

if urlparse(url).scheme == "http" and not sha256:

logger.warning(

"skipping insecure remote artifact without sha256: %s", url

)

continue

🧰 Tools

🪛 ast-grep (0.44.0)

[warning] 573-573: Do not make http calls without encryption
Context: "http://"
Note: [CWE-319] Cleartext Transmission of Sensitive Information.

(requests-http)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/fromager/cache.py` around lines 573 - 588, The URL handling in the artifact resolution logic still allows plaintext http:// wheel links and accepts downloads without integrity metadata. Update the URL resolution path in cache.py so the link parsing code that builds url from href and strips sha256 fragments rejects any non-HTTPS artifact URL or otherwise enforces that a valid sha256 is present before the artifact is accepted. Make the check in the same flow that handles href resolution and fragment parsing so absolute URLs cannot bypass the integrity requirement.

Source: Linters/SAST tools

coderabbitai · 2026-06-29T14:50:11Z

+@click.option(
+    "--keep-latest",
+    type=int,
+    default=1,
+    help="Keep this many build tags per package+version (default: 1).",
+)
+@click.pass_obj
+def gc(
+    wkctx: context.WorkContext,
+    dry_run: bool,
+    keep_latest: int,
+) -> None:
+    """Garbage-collect old builds, keeping only the latest build tags.
+
+    For each package+version, removes all but the --keep-latest most
+    recent builds (highest build tag number).
+    """
+    manager = _build_cache_manager(wkctx)
+    removed = 0
+    freed_bytes = 0
+
+    for _coll_name, coll in manager.collections.items():
+        for backend in coll.backends:
+            if not isinstance(backend, LocalDirectoryBackend):
+                continue
+
+            # Group by (package, version)
+            groups: dict[tuple, list] = {}
+            for key, info in backend._index.items():
+                group_key = (key.package, key.version)
+                groups.setdefault(group_key, []).append((key, info))
+
+            for _group_key, entries in groups.items():
+                if len(entries) <= keep_latest:
+                    continue
+
+                # Sort by build tag number descending
+                entries.sort(
+                    key=lambda e: e[0].build_tag[0] if e[0].build_tag else 0,
+                    reverse=True,
+                )
+                to_remove = entries[keep_latest:]


🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

Reject negative --keep-latest values.

entries[keep_latest:] makes negative values behave unpredictably; large negative values can select every cached build for deletion. Constrain the option at the Click boundary.

Proposed fix

`@click.option`( "--keep-latest", - type=int, + type=click.IntRange(min=0), default=1, help="Keep this many build tags per package+version (default: 1).", )

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

@click.option(

"--keep-latest",

type=int,

default=1,

help="Keep this many build tags per package+version (default: 1).",

)

@click.pass_obj

def gc(

wkctx: context.WorkContext,

dry_run: bool,

keep_latest: int,

) -> None:

"""Garbage-collect old builds, keeping only the latest build tags.

For each package+version, removes all but the --keep-latest most

recent builds (highest build tag number).

"""

manager = _build_cache_manager(wkctx)

removed = 0

freed_bytes = 0

for _coll_name, coll in manager.collections.items():

for backend in coll.backends:

if not isinstance(backend, LocalDirectoryBackend):

continue

# Group by (package, version)

groups: dict[tuple, list] = {}

for key, info in backend._index.items():

group_key = (key.package, key.version)

groups.setdefault(group_key, []).append((key, info))

for _group_key, entries in groups.items():

if len(entries) <= keep_latest:

continue

# Sort by build tag number descending

entries.sort(

key=lambda e: e[0].build_tag[0] if e[0].build_tag else 0,

reverse=True,

)

to_remove = entries[keep_latest:]

`@click.option`(

"--keep-latest",

type=click.IntRange(min=0),

default=1,

help="Keep this many build tags per package+version (default: 1).",

)

`@click.pass_obj`

def gc(

wkctx: context.WorkWorkContext,

dry_run: bool,

keep_latest: int,

) -> None:

"""Garbage-collect old builds, keeping only the latest build tags.

For each package+version, removes all but the --keep-latest most

recent builds (highest build tag number).

"""

manager = _build_cache_manager(wkctx)

removed = 0

freed_bytes = 0

for _coll_name, coll in manager.collections.items():

for backend in coll.backends:

if not isinstance(backend, LocalDirectoryBackend):

continue

# Group by (package, version)

groups: dict[tuple, list] = {}

for key, info in backend._index.items():

group_key = (key.package, key.version)

groups.setdefault(group_key, []).append((key, info))

for _group_key, entries in groups.items():

if len(entries) <= keep_latest:

continue

# Sort by build tag number descending

entries.sort(

key=lambda e: e[0].build_tag[0] if e[0].build_tag else 0,

reverse=True,

)

to_remove = entries[keep_latest:]

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/fromager/commands/cache_cmd.py` around lines 357 - 398, Reject negative values for the gc command’s --keep-latest option by constraining the Click parameter at the boundary in gc and/or the `@click.option` definition. Update the keep_latest handling so only zero or positive integers are accepted before the entries[keep_latest:] slicing logic runs, preventing negative indexes from causing unintended cache deletions.

coderabbitai · 2026-06-29T14:50:11Z

+    def test_name_default(self) -> None:
+        backend = RemotePEP503Backend(
+            server_url="https://cache.example.com/simple",
+            download_dir=pathlib.Path("/tmp/downloads"),
+        )
+        assert backend.name == "remote:https://cache.example.com/simple"
+
+    def test_not_writable(self) -> None:
+        backend = RemotePEP503Backend(
+            server_url="https://cache.example.com/simple",
+            download_dir=pathlib.Path("/tmp/downloads"),
+        )
+        assert backend.writable is False
+
+    def test_store_raises(self) -> None:
+        backend = RemotePEP503Backend(
+            server_url="https://cache.example.com/simple",
+            download_dir=pathlib.Path("/tmp/downloads"),


📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Use .test domains in these test URLs.

These are test-only URLs, so use cache.test instead of cache.example.com to match the repository convention. As per coding guidelines, tests/**/*.py: “Use the .test TLD … instead of example.com in tests”.

🧰 Tools

🪛 ast-grep (0.44.0)

[info] 331-331: Do not hardcode temporary file or directory names
Context: "/tmp/downloads"
Note: [CWE-377] Insecure Temporary File.

(hardcoded-tmp-file)

[info] 338-338: Do not hardcode temporary file or directory names
Context: "/tmp/downloads"
Note: [CWE-377] Insecure Temporary File.

(hardcoded-tmp-file)

[info] 345-345: Do not hardcode temporary file or directory names
Context: "/tmp/downloads"
Note: [CWE-377] Insecure Temporary File.

(hardcoded-tmp-file)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_cache.py` around lines 329 - 346, The test URLs in RemotePEP503Backend setup use cache.example.com, but repository conventions require .test domains in tests. Update the affected assertions and fixture-like instantiations in test_cache.py to use cache.test instead, keeping the same RemotePEP503Backend cases such as test_name_default, test_not_writable, and test_store_raises.

Source: Coding guidelines

Xaenalt requested a review from a team as a code owner June 24, 2026 13:53

Xaenalt added the enhancement New feature or request label Jun 24, 2026

mergify Bot added the ci label Jun 24, 2026

coderabbitai Bot reviewed Jun 24, 2026

View reviewed changes

Comment thread src/fromager/bootstrapper.py Outdated

Comment thread src/fromager/cache.py

Comment thread src/fromager/cache.py

Comment thread src/fromager/cache.py Outdated

Comment thread src/fromager/commands/cache_cmd.py Outdated

Comment thread tests/test_bootstrapper_iterative.py

Xaenalt force-pushed the feat/cache-manager branch from 28b92ad to c2e2a8e Compare June 25, 2026 14:48

coderabbitai Bot reviewed Jun 25, 2026

View reviewed changes

Comment thread src/fromager/cache.py Outdated

Comment thread src/fromager/cache.py Outdated

Comment thread src/fromager/cache.py Outdated

Comment thread src/fromager/cache.py Outdated

Xaenalt force-pushed the feat/cache-manager branch from 0cab1ce to be8d8ae Compare June 26, 2026 05:01

coderabbitai Bot reviewed Jun 26, 2026

View reviewed changes

Comment thread src/fromager/cache.py Outdated

Comment thread src/fromager/cache.py Outdated

Xaenalt and others added 5 commits June 29, 2026 10:42

Xaenalt force-pushed the feat/cache-manager branch from 7c4cbf0 to 4449d85 Compare June 29, 2026 14:45

coderabbitai Bot reviewed Jun 29, 2026

View reviewed changes

Uh oh!

Conversation

Xaenalt commented Jun 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Pull Request Description

What

Why

Uh oh!

coderabbitai Bot commented Jun 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Estimated code review effort

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

mergify Bot commented Jun 26, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 29, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 29, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 29, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 29, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Xaenalt commented Jun 24, 2026 •

edited

Loading

coderabbitai Bot commented Jun 24, 2026 •

edited

Loading